Results 1 - 20 of 7,163
1.
Zootaxa ; 5406(2): 238-252, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38480154

ABSTRACT

Eupyrochroa Blair, 1914 is a small genus of fire-colored beetles (Coleoptera: Pyrochroidae) with two putative species recorded from limited historical distributions in China. The two species, E. insignita (Fairmaire, 1894) and E. limbaticollis (Pic, 1909), have been distinguished on the basis of color differences in the pronotum and scutellum, characters now known to exhibit significant variability. In the present study, adult morphology of the two species was compared, and partial fragments of cytochrome c oxidase subunit I (COI) from 36 samples representing 14 pyrochroid species were obtained by extraction and a GenBank search. Nucleotide composition, genetic distance, and phylogeny were analyzed. The results of morphological and molecular analyses indicate consistency, suggesting that the two species are indistinguishable by any significant measure. Therefore, Eupyrochroa limbaticollis (Pic, 1909) is proposed as a junior synonym of E. insignita (Fairmaire, 1894). The species is also redescribed and illustrated, including both adults and larvae.
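
The abstract above does not say which genetic distance measure was applied to the COI fragments. As a minimal sketch, assuming pre-aligned sequences of equal length and the uncorrected p-distance (the proportion of compared sites that differ), the calculation might look like the following. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore gapped or ambiguous sites
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Toy aligned fragments (invented): one mismatch over twelve sites.
toy_a = "ATGGCACTATTA"
toy_b = "ATGGCGCTATTA"
print(round(p_distance(toy_a, toy_b), 4))
```

A distance near zero between samples assigned to two nominal species is the kind of evidence that would be consistent with treating them as one, although real analyses usually also apply a substitution model.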


Subjects
Coleoptera , Animals , Phylogeny , Larva , Nucleic Acid Databases
2.
J Microbiol Methods ; 220: 106921, 2024 May.
Article in English | MEDLINE | ID: mdl-38494090

ABSTRACT

Bacteria are primarily responsible for biological water treatment processes in constructed wetland systems. Gravel in constructed wetlands serves as an essential substrate onto which complex bacterial biofilms may successfully grow and evolve. To fully understand the bacterial community in these systems it is crucial to properly isolate biofilms and process DNA from such substrates. This study looked at how best to isolate bacterial biofilms from gravel substrates in terms of bacterial richness. It considered factors including the duration of agitation during extraction, extraction temperature, and enzyme usage. Further, the 16S taxonomy data subsequently produced from Illumina MiSeq reads (using the SILVA 132 ribosomal RNA (rRNA) database on the DADA2 pipeline) were compared with the 16S data produced from Oxford Nanopore Technologies (ONT) MinION reads (using the NCBI 16S database on the EPI2ME pipeline). Finally, performance was tested by comparing the taxonomy data generated from the Illumina MiSeq and ONT MinION reads using the same (SILVA 132) database. We found no significant differences in the effective number of species observed when using different bacterial biofilm detachment techniques. However, enzyme treatment enhanced the total concentration of DNA. In terms of wetland community profiles, relative abundance differences within each sample type were clearer at the genus level. For genus-level taxonomic classification, MinION sequencing with the EPI2ME pipeline (NCBI database) produced bacterial abundance information that was poorly correlated with that from the Illumina MiSeq and DADA2 pipelines (SILVA132 database). When using the same database for each sequencing technology (SILVA132), the correlation between relative abundances at genus-level improved from negligible to moderate. 
This study provides detailed information of value to researchers working on constructed wetlands regarding efficient biofilm detachment techniques for DNA isolation and 16S metabarcoding platforms for sequencing and data analysis.
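
The abstract compares treatments by the "effective number of species" without stating which diversity order was used. As a hedged illustration, assuming the Hill number of order 1 (the exponential of Shannon entropy), the quantity could be computed from per-taxon read counts as below. The function name and the example counts are hypothetical.

```python
import math


def effective_number_of_species(counts):
    """Hill number of order 1: exp(Shannon entropy) of the relative abundances."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    shannon = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing to the entropy
            p = c / total
            shannon -= p * math.log(p)
    return math.exp(shannon)


# Four equally abundant taxa behave like four "effective" species.
print(effective_number_of_species([10, 10, 10, 10]))
# A strongly uneven community has a lower effective number than its raw richness.
print(effective_number_of_species([97, 1, 1, 1]))
```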


Subjects
Nucleic Acid Databases , High-Throughput Nucleotide Sequencing , 16S Ribosomal RNA/genetics , rRNA Genes , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Bacteria/genetics
3.
Comput Biol Med ; 172: 108256, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38489989

ABSTRACT

Sepsis, a life-threatening condition triggered by the body's response to infection, presents a significant global healthcare challenge characterized by disarrayed host responses, widespread inflammation, organ impairment, and heightened mortality rates. This study introduces the ncRS database (http://www.ncrdb.cn), a meticulously curated repository housing 1144 experimentally validated non-coding RNAs (ncRNAs) intricately linked with sepsis. ncRS offers comprehensive RNA data, exhaustive experimental insights, and integrated annotations from diverse databases. This resource empowers researchers and clinicians to decipher ncRNAs' roles in sepsis pathogenesis, potentially identifying vital biomarkers for early diagnosis and prognosis, thus facilitating personalized treatments.


Subjects
Untranslated RNA , Sepsis , Humans , Untranslated RNA/genetics , Nucleic Acid Databases , Biomarkers , Sepsis/genetics
4.
Front Immunol ; 15: 1267963, 2024.
Article in English | MEDLINE | ID: mdl-38464509

ABSTRACT

Background: Coronary artery disease (CAD) and type 2 diabetes mellitus (T2DM) are closely related. The function of immunocytes in the pathogenesis of CAD and T2DM has not been extensively studied. The quantitative bioinformatics analysis of the public RNA sequencing database was applied to study the key genes that mediate both CAD and T2DM. The biological characteristics of associated key genes and mechanism of CD8+ T and NK cells in CAD and T2DM are our research focus. Methods: With expression profiles of GSE66360 and GSE78721 from the Gene Expression Omnibus (GEO) database, we identified core modules associated with gene co-expression relationships and up-regulated genes in CAD and T2DM using Weighted Gene Co-expression Network Analysis (WGCNA) and the 'limma' software package. The enriched pathways of the candidate hub genes were then explored using GO, KEGG and GSEA in conjunction with the immune gene set (from the MSigDB database). A diagnostic model was constructed using logistic regression analysis composed of candidate hub genes in CAD and T2DM. Univariate Cox regression analysis revealed hazard ratios (HRs), 95% confidence intervals (CIs), and p-values for candidate hub genes in diagnostic model, while CIBERSORT and immune infiltration were used to assess the immune microenvironment. Finally, monocytes from peripheral blood samples and their immune cell ratios were analyzed by flow cytometry to validate our findings. Results: Sixteen candidate hub genes were identified as being correlated with immune infiltration. Univariate Cox regression analysis revealed that NPEPPS and ABHD17A were highly correlated with the diagnosis of CAD and T2DM. The results indicate that CD8+ T cells (p = 0.04) and NKbright cells (p = 3.7e-3) are significantly higher in healthy controls than in individuals with CAD or CAD combined with T2DM. The bioinformatics results on immune infiltration were well validated by flow cytometry. 
Conclusions: A series of bioinformatics studies have shown ABHD17A and NPEPPS as key genes for the co-occurrence of CAD and T2DM. Our study highlights the important effect of CD8+ T and NK cells in the pathogenesis of both diseases, indicating that they may serve as viable targets for diagnosis and therapeutic intervention.


Subjects
Coronary Artery Disease , Type 2 Diabetes Mellitus , Humans , Coronary Artery Disease/genetics , Type 2 Diabetes Mellitus/genetics , Up-Regulation , CD8-Positive T-Lymphocytes , Natural Killer Cells , Nucleic Acid Databases
5.
Genome Biol ; 25(1): 60, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38409096

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .


Subjects
Nucleic Acid Databases , Genome , Software
6.
mSystems ; 9(3): e0110523, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38376167

ABSTRACT

Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral identification tools. However, the effectiveness of this strategy is unknown. Here, we benchmarked combinations of six widely used informatics tools for viral identification and analysis (VirSorter, VirSorter2, VIBRANT, DeepVirFinder, CheckV, and Kaiju), called "rulesets." Rulesets were tested against mock metagenomes composed of taxonomically diverse sequence types and diverse aquatic metagenomes to assess the effects of the degree of viral enrichment and habitat on tool performance. We found that six rulesets achieved equivalent accuracy [Matthews Correlation Coefficient (MCC) = 0.77, Padj ≥ 0.05]. Each contained VirSorter2, and five used our "tuning removal" rule designed to remove non-viral contamination. While DeepVirFinder, VIBRANT, and VirSorter were each found once in these high-accuracy rulesets, they were not found in combination with each other: combining tools does not lead to optimal performance. Our validation suggests that the MCC plateau at 0.77 is partly caused by inaccurate labeling within reference sequence databases. In aquatic metagenomes, our highest MCC ruleset identified more viral sequences in virus-enriched (44%-46%) than in cellular metagenomes (7%-19%). While improved algorithms may lead to more accurate viral identification tools, this should be done in tandem with careful curation of sequence databases. We recommend using the VirSorter2 ruleset and our empirically derived tuning removal rule. Our analysis provides insight into methods for in silico viral identification and will enable more robust viral identification from metagenomic data sets. 
IMPORTANCE: The identification of viruses from environmental metagenomes using informatics tools has offered critical insights in microbial ecology. However, it remains difficult for researchers to know which tools optimize viral recovery for their specific study. In an attempt to recover more viruses, studies are increasingly combining the outputs from multiple tools without validating this approach. After benchmarking combinations of six viral identification tools against mock metagenomes and environmental samples, we found that these tools should only be combined cautiously. Two to four tool combinations maximized viral recovery and minimized non-viral contamination compared with either the single-tool or the five- to six-tool ones. By providing a rigorous overview of the behavior of in silico viral identification strategies and a pipeline to replicate our process, our findings guide the use of existing viral identification tools and offer a blueprint for feature engineering of new tools that will lead to higher-confidence viral discovery in microbiome studies.
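
The rulesets above are scored with the Matthews Correlation Coefficient (MCC). As a small sketch of how that metric is computed from a confusion matrix of viral versus non-viral calls, the standard formula can be written as follows. The function name and the example counts are hypothetical and are not data from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts: 90 viral contigs found, 10 missed,
# 85 non-viral contigs correctly rejected, 15 wrongly called viral.
print(round(mcc(tp=90, tn=85, fp=15, fn=10), 3))
```

Because MCC uses all four cells of the confusion matrix, it penalizes a ruleset that gains viral recovery only by admitting more non-viral contamination.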


Subjects
Benchmarking , Viruses , Ecosystem , Metagenomics/methods , Nucleic Acid Databases
7.
Nat Comput Sci ; 4(2): 104-109, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38413777

ABSTRACT

Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as it is challenging to efficiently search them for any sequence(s) of interest. We present kmindex, an approach that can index thousands of metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. Here we demonstrate the scalability of kmindex by successfully indexing 1,393 marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server Ocean Read Atlas, which enables real-time queries on the Tara Oceans dataset.
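
To make the idea of k-mer based sequence search concrete, here is a toy inverted index that maps each k-mer to the samples containing it and reports, per sample, the fraction of query k-mers found. This is only an illustration under simplifying assumptions (exact in-memory sets, no reverse complements, no abundance filtering); it is not the data structure or algorithm used by kmindex, and all names and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(samples: dict, k: int) -> dict:
    """Map every k-mer to the set of sample names whose reads contain it."""
    index = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(name)
    return index


def query(index: dict, seq: str, k: int) -> dict:
    """Fraction of the query's k-mers present in each sample with at least one hit."""
    query_kmers = kmers(seq, k)
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    return {name: n / len(query_kmers) for name, n in hits.items()}


# Invented toy "metagenomes" with one read each.
toy_samples = {"s1": ["ACGTACGT"], "s2": ["TTTTGGGG"]}
toy_index = build_index(toy_samples, k=4)
print(query(toy_index, "ACGTAC", k=4))
```

Exact sets like these do not scale to thousands of metagenomes; production tools rely on compact, approximate structures, which is where the false positive rate quoted in the abstract becomes relevant.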


Subjects
Genomics , Seawater , Oceans and Seas , Metagenome/genetics , Nucleic Acid Databases
8.
Nucleic Acids Res ; 52(4): 1628-1644, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38261968

ABSTRACT

A growing body of evidence indicates an important role of miRNAs in cancer; however, there is no definitive, convenient-to-use list of cancer-related miRNAs or miRNA genes that may serve as a reference for analyses of miRNAs in cancer. To this end, we created a list of 165 cancer-related miRNA genes called the Cancer miRNA Census (CMC). The list is based on a score, built on various types of functional and genetic evidence for the role of particular miRNAs in cancer, e.g. miRNA-cancer associations reported in databases, associations of miRNAs with cancer hallmarks, or signals of positive selection of genetic alterations in cancer. The presence of well-recognized cancer-related miRNA genes, such as MIR21, MIR155, MIR15A, MIR17 or MIRLET7s, at the top of the CMC ranking directly confirms the accuracy and robustness of the list. Additionally, to verify and indicate the reliability of CMC, we performed a validation of criteria used to build CMC, comparison of CMC with various cancer data (publications and databases), and enrichment analyses of biological pathways and processes such as Gene Ontology or DisGeNET. All validation steps showed a strong association of CMC with cancer/cancer-related processes confirming its usefulness as a reference list of miRNA genes associated with cancer.


Subjects
Nucleic Acid Databases , MicroRNAs , Neoplasms , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/genetics , Reproducibility of Results
9.
BMC Res Notes ; 17(1): 35, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38268047

ABSTRACT

OBJECTIVE: A reliable taxonomic identification of species from molecular samples is the first step for many studies. For researchers unfamiliar with programming, running a BLAST analysis, filtering, and organizing results for hundreds of sequences through the BLAST web interface can be difficult. Additionally, sequences deposited in GenBank can have outdated taxonomic identification. The use of a reliable Reference Sequences Library (RSL) containing accurately identified sequences facilitates this task. For users who have an RSL available, we developed a tool that automates the molecular taxonomic identification of sequences. RESULTS: We developed PARSID, a Python script run from the command line that automates the routine workflow of blasting an input sequence file against the user's RSL and retrieves the matches with the highest percentage of identity in five steps. PARSID accepts cut-off parameters and supplementary information in a .csv file for filtering the results. The final output is visualized in a spreadsheet. We tested its functioning using 10 input sequences simulating different situations of the molecular taxonomic identification of sequences against an example RSL containing 25 sequences. Step-by-step instructions and test files are publicly available at https://github.com/kokinide/PARSID.git .
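
The core of the workflow described above, keeping for each query the match with the highest percentage of identity above a cut-off, can be sketched in a few lines. This is not PARSID's code and does not reproduce its five steps. It assumes BLAST+ tabular output (`-outfmt 6`), whose first three default columns are query ID, subject ID, and percent identity. The function name, the default cut-off, and the sample lines are hypothetical.

```python
def best_hits(blast_lines, min_identity=98.0):
    """Pick the highest-identity hit per query from BLAST tabular lines.

    Each line is expected to be tab-separated with at least three columns:
    query ID, subject ID, percent identity. Hits below `min_identity`
    are discarded; ties keep the first hit encountered.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        identity = float(fields[2])
        if identity < min_identity:
            continue
        if query_id not in best or identity > best[query_id][1]:
            best[query_id] = (subject_id, identity)
    return best


# Invented example: q1 has two hits, q2 has only a weak hit.
toy_lines = [
    "q1\tspecies_A\t99.5",
    "q1\tspecies_B\t97.0",
    "q2\tspecies_C\t90.0",
]
print(best_hits(toy_lines))
```

Queries with no hit above the cut-off, such as `q2` here, are absent from the result, so a caller can flag them as unidentified rather than assigning a poorly supported name.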


Subjects
Nucleic Acid Databases , Publications , Humans , Gene Library , Research Personnel , Workflow
10.
Infect Genet Evol ; 118: 105557, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38244748

ABSTRACT

Human infections with Rocahepevirus ratti genotype C1 (HEV-C1) in Hong Kong of China, Canada, Spain, and France have drawn worldwide concern towards Rocahepevirus. This study conducted a global genetic analysis of Rocahepevirus, aiming to furnish comprehensive molecular insights and promote further research. We retrieved 817 Rocahepevirus sequences from the GenBank database through October 31, 2023, categorizing them according to research, sample collection area and date, genotype, host, and sequence length. Subsequently, we conducted descriptive epidemiological, phylogenetic evolutionary, and protein polymorphism (in length and identity) analyses on these sequences. Rocahepevirus genomes were identified across twenty-eight countries, predominantly in Asia (71.73%, 586/817) and Europe (26.44%, 216/817). The HEV-C1 dominates Rocahepevirus (77.2%, 631/817), while newly discovered Rocahepevirus genotypes (C3/C4/C5 and other unclassified genotypes) were primarily identified in Europe (25/120) and China (91/120). Muridae animals (72.5%, 592/817) serve as the primary hosts for Rocahepevirus, with other hosts encompassing species from the families Soricidae, Hominidae, Mustelidae, and Cricetidae. Additionally, Rocahepevirus genomes (C1 genotype) were identified in sewage samples recently. The phylogenetic evolution of Rocahepevirus exhibits considerable variation. Specifically, HEV-C1 can be classified into at least six genetic groups (G1 to G6), with human HEV-C1 distributed across multiple evolutionary clades. The overall ORF1 and ORF2 amino acid sequence lengths were significantly different (P < 0.001) across Rocahepevirus genotypes. HEV-C1/C2/C3 and HEV-C4/C5 displayed substantial differences in amino acid sequence identity (58.4%-59.6%). The identification of Rocahepevirus genomes has expanded across numerous countries, particularly in European and Asian countries, coinciding with an expanding host range and emergence of new genotypes. 
The evolutionary path of Rocahepevirus is intricate, where the HEV-C1 dominates globally and internally forms multiple evolutionary groups (G1 to G6), exhibiting diverse genetic variation within human HEV-C1. Significant differences exist in the protein polymorphism (in length and identity) across Rocahepevirus genotypes. Given Rocahepevirus's shift from an animal virus to a zoonotic pathogen, worldwide cooperation in monitoring Rocahepevirus genomes is vital.


Subjects
Mustelidae , Viruses , Humans , Animals , Phylogeny , Molecular Epidemiology , Arvicolinae , Nucleic Acid Databases , Hong Kong , Muridae
11.
Vet Parasitol Reg Stud Reports ; 47: 100962, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38199700

ABSTRACT

This study reports the infection and diagnosis of the protozoan morphologic complex Trichomonas gallinae in a baby red-breasted toucan (Ramphastos dicolorus). Nodular lesions on the soft palate and edema in the oral cavity were observed macroscopically. Microscopically, a granuloma with multiple layers of necrosis interspersed with inflammatory polymorphonuclear infiltrates was observed. Parasitism was confirmed by parasitological diagnosis, isolation of the flagellates in culture medium, and Polymerase Chain Reaction (PCR) using 5.8S ribosomal RNA (rRNA). Flanking internal transcribed spacer (ITS) gene regions were amplified by PCR, and the sequences were analyzed phylogenetically using MEGA 11 software. Phylogenetic analysis based on ITS1/5.8S rRNA/ITS2 sequences demonstrated high nucleotide identity with two Trichomonas sequences available in GenBank, which were more closely related to T. vaginalis (99%) than to T. gallinae (98%). Because wild birds are potential transmitters of this protozoan, rigorous monitoring of infectious and parasitic diseases in their populations is essential for their preservation. The forms of transmission of Trichomonas sp. favor the occurrence of the disease in many non-Columbiformes species, which makes monitoring of this disease in wild birds essential.


Subjects
Trichomonas Infections , Trichomonas , Animals , Phylogeny , Trichomonas Infections/diagnosis , Trichomonas Infections/veterinary , Trichomonas/genetics , Birds , Nucleic Acid Databases
12.
Database (Oxford) ; 2024, 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38284937

ABSTRACT

Insect decline has become a growing concern in recent years, with studies showing alarming declines in populations of several taxa. Our knowledge about genetic spatial patterns and evolutionary history of insects still exhibits significant gaps hindering our ability to effectively conserve and manage insect populations and species. Genetic data may provide valuable insights into the diversity and the evolutionary relationships of insects' species and populations. Public repositories, such as GenBank and BOLD, containing vast archives of genetic data with associated metadata, offer an irreplaceable resource for researchers contributing to our understanding of species diversity, population structure and evolutionary relationships. However, there are some issues in using these data, as they are often scattered and may lack accuracy due to inconsistent sampling protocols and incomplete information. In this paper we describe a curated georeferenced database of genetic data collected in GenBank and BOLD, for insects listed in the International Union for Conservation of Nature (IUCN) Italian Red Lists (dragonflies, bees, saproxylic beetles and butterflies). After querying these repositories, we performed quality control and data standardization steps. We created a dataset containing approximately 33 000 mitochondrial sequences and associated metadata about taxonomy, collection localities, geographic coordinates and IUCN Red List status for 1466 species across the four insect lists. We describe the current state of geographical metadata in queried repositories for species listed under different conservation status in the Italian Red Lists to quantify data gaps posing barriers to prioritization of conservation actions. Our curated dataset is available for data repurposing and analysis, enabling researchers to conduct comparative studies. 
We emphasize the importance of filling knowledge gaps in insect diversity and distribution and highlight the potential of this dataset for promoting other research fields like phylogeography, macrogenetics and conservation strategies. Our database can be downloaded through the Zenodo repository in SQL format. Database URL:  https://zenodo.org/records/8375181.


Subjects
Butterflies , Odonata , Bees , Animals , Humans , Insects/genetics , Nucleic Acid Databases , Geography
13.
Nucleic Acids Res ; 52(D1): D52-D60, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37739414

ABSTRACT

Recent studies have demonstrated the important regulatory role of circRNAs, but an in-depth understanding of the comprehensive landscape of circRNAs across various species still remains unexplored. The current circRNA databases are often species-restricted or based on outdated datasets. To address this challenge, we have developed the circAtlas 3.0 database, which contains a rich collection of 2674 circRNA sequencing datasets, curated to delineate the landscape of circRNAs within 33 distinct tissues spanning 10 vertebrate species. Notably, circAtlas 3.0 represents a substantial advancement over its precursor, circAtlas 2.0, with the number of cataloged circRNAs escalating from 1 007 087 to 3 179 560, with 2 527 528 of them being reconstructed into full-length isoforms. circAtlas 3.0 also introduces several notable enhancements, including: (i) integration of both Illumina and Nanopore sequencing datasets to detect circRNAs of extended lengths; (ii) employment of a standardized nomenclature scheme for circRNAs, providing information of the host gene and full-length circular exons; (iii) inclusion of clinical cancer samples to explore the biological function of circRNAs within the context of cancer and (iv) links to other useful resources to enable user-friendly analysis of target circRNAs. The updated circAtlas 3.0 provides an important platform for exploring the evolution and biological implications of vertebrate circRNAs, and is freely available at http://circatlas.biols.ac.cn and https://ngdc.cncb.ac.cn/circatlas.


Subjects
Nucleic Acid Databases , Neoplasms , Circular RNA , Animals , Humans , Neoplasms/genetics , Vertebrates/genetics
14.
Nucleic Acids Res ; 52(D1): D115-D123, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37823705

ABSTRACT

Circular RNAs (circRNAs) are RNA molecules with a continuous loop structure characterized by back-splice junctions (BSJs). While analyses of short-read RNA sequencing have identified millions of BSJ events, it is inherently challenging to determine exact full-length sequences and alternatively spliced (AS) isoforms of circRNAs. Recent advances in nanopore long-read sequencing with circRNA enrichment bring an unprecedented opportunity for investigating the issues. Here, we developed FL-circAS (https://cosbi.ee.ncku.edu.tw/FL-circAS/), which collected such long-read sequencing data of 20 cell lines/tissues and thereby identified 884 636 BSJs with 1 853 692 full-length circRNA isoforms in human and 115 173 BSJs with 135 617 full-length circRNA isoforms in mouse. FL-circAS also provides multiple circRNA features. For circRNA expression, FL-circAS calculates expression levels for each circRNA isoform, cell line/tissue specificity at both the BSJ and isoform levels, and AS entropy for each BSJ across samples. For circRNA biogenesis, FL-circAS identifies reverse complementary sequences and RNA binding protein (RBP) binding sites residing in flanking sequences of BSJs. For functional patterns, FL-circAS identifies potential microRNA/RBP binding sites and several types of evidence for circRNA translation on each full-length circRNA isoform. FL-circAS provides user-friendly interfaces for browsing, searching, analyzing, and downloading data, serving as the first resource for discovering full-length circRNAs at the isoform level.


Subjects
Nucleic Acid Databases , Circular RNA , Animals , Humans , Mice , Alternative Splicing/genetics , MicroRNAs/genetics , MicroRNAs/metabolism , Nanopore Sequencing , Circular RNA/genetics , RNA Isoforms/genetics
15.
Nucleic Acids Res ; 52(D1): D134-D137, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37889039

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 25 trillion base pairs from over 3.7 billion nucleotide sequences for 557 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include policies for including spatio-temporal metadata, clarified documentation for GenBank data processing, enhanced foreign contamination screening tools, new processes in the Submission Portal, migration of Entrez Genome and Assembly displays into NCBI Datasets, and the impending retirement of tbl2asn, replaced by table2asn.


Subjects
Nucleic Acid Databases , Genomics , Base Sequence , Internet , Humans
16.
Nucleic Acids Res ; 52(D1): D351-D359, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37904593

ABSTRACT

A growing interest in aptamer research, as evidenced by the increase in aptamer publications over the years, has led to calls for a go-to site for aptamer information. A comprehensive, publicly available aptamer dataset, which may be a repository for aptamer data, standardize aptamer reporting, and generate opportunities to expand current research in the field, could meet such a demand. There have been several attempts to create aptamer databases; however, most have been abandoned or removed entirely from public view. Inspired by previous efforts, we have published the UTexas Aptamer Database, https://sites.utexas.edu/aptamerdatabase, which includes a publicly available aptamer dataset and a searchable database containing a subset of all aptamer data collected to date (1990-2022). The dataset contains aptamer sequences, binding and selection information. The information is regularly reviewed internally to ensure accuracy and consistency across all entries. To support the continued curation and review of aptamer sequence information, we have implemented sustaining mechanisms, including researcher training protocols, an aptamer submission form, data stored separately from the database platform, and a growing team of researchers committed to updating the database. Currently, the UTexas Aptamer Database is the largest in terms of the number of aptamer sequences with 1,443 internally reviewed aptamer records.


Subjects
Nucleotide Aptamers , Nucleic Acid Databases , Datasets as Topic
17.
Nucleic Acids Res ; 52(D1): D98-D106, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953349

ABSTRACT

Long noncoding RNAs (lncRNAs) have emerged as crucial regulators across diverse biological processes and diseases. While high-throughput sequencing has enabled lncRNA discovery, functional characterization remains limited. The EVLncRNAs database is the first and exclusive repository for all experimentally validated functional lncRNAs from various species. After previous releases in 2018 and 2021, this update marks a major expansion through exhaustive manual curation of nearly 25 000 publications from 15 May 2020 to 15 May 2023. It incorporates substantial growth across all categories: a 154% increase in functional lncRNAs, 160% in associated diseases, 186% in lncRNA-disease associations, 235% in interactions, 138% in structures, 234% in circular RNAs, 235% in resistant lncRNAs and 4724% in exosomal lncRNAs. More importantly, it incorporates additional information, including functional classifications, detailed interaction pathways, homologous lncRNAs, lncRNA locations, and COVID-19, phase-separation and organoid-related lncRNAs. The web interface was substantially improved for browsing, visualization, and searching. ChatGPT was tested for information extraction and functional overview, with its limitations noted. EVLncRNAs 3.0 represents the most extensive curated resource of experimentally validated functional lncRNAs and will serve as an indispensable platform for unravelling emerging lncRNA functions. The updated database is freely available at https://www.sdklab-biophysics-dzu.net/EVLncRNAs3/.


Subjects
Nucleic Acid Databases , Long Noncoding RNA , Data Management , Information Storage and Retrieval , Long Noncoding RNA/genetics
18.
Nucleic Acids Res ; 52(D1): D791-D797, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953409

ABSTRACT

UNITE (https://unite.ut.ee) is a web-based database and sequence management environment for molecular identification of eukaryotes. It targets the nuclear ribosomal internal transcribed spacer (ITS) region and offers nearly 10 million such sequences for reference. These are clustered into ∼2.4M species hypotheses (SHs), each assigned a unique digital object identifier (DOI) to promote unambiguous referencing across studies. UNITE users have contributed over 600 000 third-party sequence annotations, which are shared with a range of databases and other community resources. Recent improvements facilitate the detection of cross-kingdom biological associations and the integration of undescribed groups of organisms into everyday biological pursuits. Serving as a digital twin for eukaryotic biodiversity and communities worldwide, the latest release of UNITE offers improved avenues for biodiversity discovery, precise taxonomic communication and integration of biological knowledge across platforms.


Subjects
Nucleic Acid Databases , Fungi , Ribosomal Spacer DNA , Fungi/genetics , Biodiversity , Fungal DNA , Phylogeny
19.
Nucleic Acids Res ; 52(D1): D67-D71, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971299

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description making it suitable for multi-omics large studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.


Subjects
Nucleic Acid Databases , Metabolomics , Metadata , Humans , Computational Biology , Genomics , Internet , Japan , Multiomics/methods
20.
Nucleic Acids Res ; 52(D1): D92-D97, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37956313

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.


Subjects
Genomics , Nucleotides , Computational Biology , Nucleic Acid Databases , Internet , Reproducibility of Results , Europe